Tutorial: Key point detection

A key point problem is a variant of image regression in which the target is a "key point": a specific location in an image.

In this tutorial, we'll be looking for the centre of the person's face in each image, predicting two values for each image: the row and column of the face centre. The full fastai tutorial can be found in their notebook manual:

This problem can then be expanded to find the centroid of an asteroid!

Set up the GPU

Before starting, you will need to set Colab to use a GPU:

Install Fastai libraries

The fastai2 library is not pre-installed in Colab, so we first need to pip install it. Using "!" at the beginning of a cell in a Jupyter notebook runs the cell as a shell command, which here installs the library directly onto the Colab virtual machine. You only need to rerun this after restarting the kernel.

In [ ]:
!pip3 install fastai2
Collecting fastai2
  Downloading https://files.pythonhosted.org/packages/26/4f/0f61bb0d376eb47c20430639bac4946ca0cffcd7e693fb86698656324f2d/fastai2-0.0.17-py3-none-any.whl (190kB)
     |████████████████████████████████| 194kB 3.4MB/s 
Requirement already satisfied: spacy in /usr/local/lib/python3.6/dist-packages (from fastai2) (2.2.4)
Requirement already satisfied: torch>=1.3.0 in /usr/local/lib/python3.6/dist-packages (from fastai2) (1.5.0+cu101)
Requirement already satisfied: pillow in /usr/local/lib/python3.6/dist-packages (from fastai2) (7.0.0)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from fastai2) (0.22.2.post1)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from fastai2) (3.2.1)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from fastai2) (2.23.0)
Requirement already satisfied: fastprogress>=0.1.22 in /usr/local/lib/python3.6/dist-packages (from fastai2) (0.2.3)
Requirement already satisfied: torchvision>=0.5 in /usr/local/lib/python3.6/dist-packages (from fastai2) (0.6.0+cu101)
Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from fastai2) (1.4.1)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from fastai2) (3.13)
Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from fastai2) (1.0.3)
Collecting fastcore
  Downloading https://files.pythonhosted.org/packages/dd/f3/8cd2e1ed981b0ddbe4d56e5d44f52c9e56d27ac7d53c30abb534d10c82c2/fastcore-0.1.17-py3-none-any.whl
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.0.2)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.1.3)
Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (7.4.0)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (0.6.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.0.2)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.0.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (2.0.3)
Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (46.1.3)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (3.0.2)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.18.4)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (0.4.1)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (4.41.1)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from torch>=1.3.0->fastai2) (0.16.0)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn->fastai2) (0.14.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (1.2.0)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (2020.4.5.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->fastai2) (2018.9)
Requirement already satisfied: dataclasses>='0.7'; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from fastcore->fastai2) (0.7)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy->fastai2) (1.6.0)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10->matplotlib->fastai2) (1.12.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.6/dist-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy->fastai2) (3.1.0)
Installing collected packages: fastcore, fastai2
Successfully installed fastai2-0.0.17 fastcore-0.1.17

Any machine learning task that involves understanding images falls into the area of computer vision. In fastai, all the modules related to computer vision can be found under fastai2.vision.

In [ ]:
from fastai2 import *
from fastai2.vision.all import *
In [ ]:
import numpy as np
import pandas as pd

Download & Access Data

We will use a dataset hosted by fastai for this tutorial. It contains images (.jpg files), with the coordinates of the centre of the face in each image stored in a corresponding pose.txt file. The files are grouped into 24 directories, each containing independent photographs of a different person.

In [ ]:
path = untar_data(URLs.BIWI_HEAD_POSE)
In [ ]:
path.ls()
Out[ ]:
(#50) [Path('/root/.fastai/data/biwi_head_pose/24'),Path('/root/.fastai/data/biwi_head_pose/18'),Path('/root/.fastai/data/biwi_head_pose/17'),Path('/root/.fastai/data/biwi_head_pose/16.obj'),Path('/root/.fastai/data/biwi_head_pose/20.obj'),Path('/root/.fastai/data/biwi_head_pose/09'),Path('/root/.fastai/data/biwi_head_pose/09.obj'),Path('/root/.fastai/data/biwi_head_pose/04.obj'),Path('/root/.fastai/data/biwi_head_pose/02.obj'),Path('/root/.fastai/data/biwi_head_pose/20')...]
In [ ]:
Path.BASE_PATH = path
In [ ]:
path.ls().sorted()
Out[ ]:
(#50) [Path('01'),Path('01.obj'),Path('02'),Path('02.obj'),Path('03'),Path('03.obj'),Path('04'),Path('04.obj'),Path('05'),Path('05.obj')...]
In [ ]:
(path/'01').ls().sorted()
Out[ ]:
(#1000) [Path('01/depth.cal'),Path('01/frame_00003_pose.txt'),Path('01/frame_00003_rgb.jpg'),Path('01/frame_00004_pose.txt'),Path('01/frame_00004_rgb.jpg'),Path('01/frame_00005_pose.txt'),Path('01/frame_00005_rgb.jpg'),Path('01/frame_00006_pose.txt'),Path('01/frame_00006_rgb.jpg'),Path('01/frame_00007_pose.txt')...]
In [ ]:
img_files = get_image_files(path)
In [ ]:
im = PILImage.create(img_files[0])
im.shape
Out[ ]:
(480, 640)
In [ ]:
im.to_thumb(254)
Out[ ]:

Preprocess Data

For each image, we need to be able to load in the location of the centre of the head.

In [ ]:
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')
In [ ]:
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)  # camera calibration matrix
def get_ctr(f):
    # read the 3D head centre from the pose file and project it onto the
    # image plane using the calibration matrix, giving 2D pixel coordinates
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
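As a quick sanity check of the filename convention (pure Python, no fastai needed): the slicing in img2pose simply swaps the 7-character "rgb.jpg" suffix for "pose.txt".

```python
from pathlib import Path

# Reproduction of the helper above: drop the trailing "rgb.jpg" (7 characters)
# and append "pose.txt" to get the matching pose file.
def img2pose(x):
    return Path(f'{str(x)[:-7]}pose.txt')

print(img2pose(Path('01/frame_00003_rgb.jpg')))  # 01/frame_00003_pose.txt
```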

For fastai2, we need to load each image (ImageBlock) and head coordinate (PointBlock) into a specific DataBlock format:

  • get_items collects all the image files; get_y then calls the function above on each of those files to load the associated head-centre coordinates.
  • splitter specifies how to split the data into a training set and a validation set. As we want our model to be tested on unseen data, we hold out all the images from directory 13 (photographs of a person the model never sees during training).
  • batch_tfms defines any transformations we want to do after the images have been batched for training (this is more efficient as it means the transforms are performed on the GPU). Here, we can normalise the data, resize the images and perform data augmentation techniques to try and help the model generalise well:
In [ ]:
batch_tfms = [*aug_transforms(size=(240,320)), Normalize.from_stats(*imagenet_stats)]

dblock = DataBlock(blocks    = (ImageBlock, PointBlock),
                   get_items = get_image_files,
                   get_y     = get_ctr,
                   splitter  = FuncSplitter(lambda o: o.parent.name=='13'),
                   batch_tfms= batch_tfms)
In [ ]:
# debugging the DataBlock
#dblock.summary('')

We then call the DataBlock on our image set at path and choose a batch size. Larger batch sizes make better use of the GPU but require more memory: if your images are too large, you will have to choose a smaller batch size, which is why we resized them above.
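To get a feel for the memory side of that trade-off, some back-of-the-envelope arithmetic (an illustration only; the activations inside the network take far more memory than the input batch itself):

```python
# Memory taken by one input batch alone, at float32 (4 bytes per value):
bs, channels, h, w = 16, 3, 240, 320   # batch size and image shape after resizing
bytes_per_batch = bs * channels * h * w * 4
print(bytes_per_batch / 2**20)  # ~14.06 MiB: doubling bs or the image area doubles this
```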

In [ ]:
dls = dblock.dataloaders(path, bs=16)
In [ ]:
print("Number of images in the training vs validation sets: {}, {}".format(dls.train.n, dls.valid.n))
Number of images in the training vs validation sets: 15193, 485

Check whether the data looks ok before training:

In [ ]:
dls.show_batch(max_n=9, figsize=(10,10))

What are the dimensions of your dataset?

  • The images have shape (batch size, number of channels (e.g. RGB), height, width)
  • The point coordinates have shape (batch size, points per image, 2), since each point is an (x,y) pair
In [ ]:
xb,yb = dls.one_batch()
xb.shape,yb.shape
Out[ ]:
(torch.Size([16, 3, 480, 640]), torch.Size([16, 1, 2]))
In [ ]:
# split the image into its three colour channels and show each one
im = image2tensor(Image.open(img_files[0]))
_,axs = subplots(1,3)
for i,ax,color in zip(im,axs,('Reds','Greens','Blues')):
    show_image(255-i, ax=ax, cmap=color)

Training a Model

As we are working with images, we will make use of transfer learning to improve our accuracy and convergence speed by using a pre-trained model.

Here, we make use of ResNet, a classic neural network used as a backbone for many computer vision tasks. This model won the ImageNet challenge in 2015, a classification task covering thousands of image categories and millions of images.


With transfer learning, we can take this pre-trained classification model and use it for a task that is different to what it was originally trained for: our regression problem.

We keep and freeze the first layers of the network, which have been trained on the large ImageNet dataset. These early layers have already learnt what edges and colours look like. The later layers have learnt characteristics specific to the original task (e.g. distinguishing dog breeds) that we are not interested in, so we remove them and replace them with randomly initialised weights.

Then, we only optimise these last few layers, which are tailored towards our smaller specific dataset. This approach is faster and requires much less data than learning from scratch.


First, we set up a learner object:

  • We use a convolutional neural network (CNN) with weights pre-trained on ImageNet (a ResNet). The cnn_learner function is designed for transfer learning and randomly initialises the weights of the final layers (the head).

    • When choosing the number of layers to use from ResNet: always start with fewer layers when exploring the training, as it is much faster. Increasing the number of layers might not lead to better results, and increases the likelihood of overfitting.
  • Since we're predicting a continuous number, rather than a category, we have to tell fastai what range our target has, using the y_range parameter.

  • We can specify which metrics we are interested in for evaluating how the model performs on the validation dataset. The available built-in metrics can be found in: https://github.com/fastai/fastai2/blob/master/fastai2/metrics.py.
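Under the hood, y_range is typically applied by squashing the final activation through a scaled sigmoid (fastai calls this sigmoid_range). A minimal pure-Python sketch of the idea:

```python
import math

# Sketch of sigmoid-based range scaling: whatever the raw activation x is,
# the output is confined to the open interval (lo, hi).
def sigmoid_range(x, lo=-1.0, hi=1.0):
    return (hi - lo) / (1 + math.exp(-x)) + lo

print(sigmoid_range(0.0))    # 0.0, the midpoint of (-1, 1)
print(sigmoid_range(100.0))  # ~1.0: large activations saturate near hi
```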

In [ ]:
learn = cnn_learner(dls, resnet18, y_range=(-1,1), metrics=[mse, mae]) # try resnet18, resnet34, resnet50
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/checkpoints/resnet18-5c106cde.pth

Here, the default loss function is the mean square error (typical for regression problems).

In [ ]:
dls.loss_func
Out[ ]:
FlattenedLoss of MSELoss()
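For a regression target like ours, the MSE is just the squared coordinate error averaged over every coordinate in the batch. A hand calculation on made-up (already normalised) numbers:

```python
# Two predicted (x, y) points vs. their targets (made-up values):
pred   = [[0.1, 0.2], [0.0, -0.1]]
target = [[0.0, 0.2], [0.2, -0.1]]

# mean of the squared differences over all 4 coordinates
mse = sum((p - t) ** 2 for pr, ta in zip(pred, target)
                       for p, t in zip(pr, ta)) / 4
print(round(mse, 6))  # 0.0125
```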

Choosing your learning rate:

The learning rate is often one of the most important parameters as it is used to define the step size in the optimisation:

  • Learning rate too low: very small steps: large number of epochs needed to converge
  • Learning rate too high: large steps: loss starts to increase and you diverge
  • Don't choose the learning rate at the minimum of the loss curve: by that point the loss has already stopped improving, so the model is no longer learning

Good choices:

  • The steepest gradient: the learning rate for which the model is improving the most
  • Rule of thumb: an order of magnitude less than the minimum

Human in the loop: the shape of the learning rate finder curve may be such that the suggested values it returns are not the best for your problem...
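These regimes are easy to see on a toy problem. A sketch (nothing to do with fastai's implementation): plain gradient descent on the quadratic loss loss(w) = w², whose gradient is 2w:

```python
def descend(lr, steps=20, w=1.0):
    # gradient descent on loss(w) = w**2 (gradient: 2*w)
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(abs(descend(0.001)))  # ~0.96: too low, barely moves after 20 steps
print(abs(descend(0.1)))    # ~0.01: converges nicely
print(abs(descend(1.5)))    # ~1e6: too high, each step overshoots and diverges
```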

In [ ]:
suggested_lrs = learn.lr_find()
suggested_lrs
Out[ ]:
SuggestedLRs(lr_min=7.585775847473997e-08, lr_steep=6.309573450380412e-07)
In [ ]:
# choose learning rate based on lr_finder

lr = suggested_lrs.lr_steep

Fine tuning is the fastai helper for transfer learning. First, the pre-trained layers are frozen and only the randomly initialised head is trained for one epoch; then the whole network is unfrozen and trained for the requested number of epochs.

In [ ]:
learn.fine_tune(epochs=3, base_lr=lr)
epoch train_loss valid_loss mse mae time
0 0.402091 0.231889 0.231889 0.404809 02:22
epoch train_loss valid_loss mse mae time
0 0.383483 0.174924 0.174924 0.348074 02:48
1 0.381297 0.196985 0.196985 0.367655 02:47
2 0.378493 0.135382 0.135382 0.303356 02:46
In [ ]:
# choose learning rate based on lr_finder

lr = 5e-3

Fastai uses callbacks to help you customise your training: here we want to save the best model seen during training, and reload it at the end.

In [ ]:
# define callback to save best model
cb = SaveModelCallback(fname="best")
In [ ]:
learn.fine_tune(epochs=2, base_lr=lr, cbs=[cb])
epoch train_loss valid_loss mse mae time
0 0.025444 0.004414 0.004414 0.056987 07:20
epoch train_loss valid_loss mse mae time
0 0.005779 0.004249 0.004249 0.054545 09:57
1 0.002271 0.000936 0.000936 0.028062 09:55
In [ ]:
print("Best MSE (loss): {}".format(cb.best))
print("Prediction error: {:.4}%".format(np.sqrt(cb.best)*100))
Best MSE (loss): 0.0009356908849440515
Prediction error: 3.059%
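The "prediction error" printed here is just the root of the (normalised-coordinate) MSE, expressed as a percentage of the image dimension:

```python
import math

# RMSE of the normalised coordinates, as a percentage of the image dimension
mse_best = 0.0009356908849440515  # best validation MSE printed above
print(round(math.sqrt(mse_best) * 100, 3))  # 3.059
```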

Beware of overfitting: it sets in when the validation loss starts to increase while the training loss keeps falling.

In [ ]:
learn.recorder.plot_loss()

Fastai implements a learning rate scheme called "fit one cycle", where the learning rates are varied over the course of training to improve convergence. This includes a warm-up stage, where the learning rates are smaller at the beginning of training, before building up to a maximum learning rate and then decaying towards the end. For best results, these smaller learning rates at the end of training should not be wasted on epochs which are overfitting.
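A rough pure-Python sketch of such a schedule. The exact shape here (cosine warm-up over the first 25% of steps, cosine decay after) is an assumption for illustration, not fastai's exact implementation:

```python
import math

def one_cycle(step, total, lr_max=5e-3, lr_min=5e-5, pct_warm=0.25):
    # position in the cycle: t ramps 0 -> 1 during warm-up, then 1 -> ~0 after
    warm = pct_warm * total
    t = step / warm if step < warm else 1 - (step - warm) / (total - warm)
    # cosine interpolation between lr_min and lr_max
    return lr_min + (lr_max - lr_min) * (1 - math.cos(math.pi * t)) / 2

lrs = [one_cycle(s, total=100) for s in range(100)]
# peaks at lr_max a quarter of the way through, tails off towards lr_min
```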

In [ ]:
learn.recorder.plot_sched()
In [ ]:
learn.show_results(ds_idx=1) #figsize=(6,8))

Understanding the results

Baseline

Will using deep learning lead to a measurable improvement over more traditional, or more easily interpretable, methods?

Here we calculate the predictive accuracy of a dummy baseline, which always predicts that the centre of the face is the centre of the image, to compare with our ML results.

Retrieve true coordinates of centre of face from the validation data set:

In [ ]:
valid_img_files = get_image_files((path/"13"))
true = np.array([get_ctr(f).numpy() for f in valid_img_files])
true[:5]
Out[ ]:
array([[365.94308, 223.94508],
       [337.9678 , 230.14499],
       [375.12286, 245.0329 ],
       [332.9224 , 260.44562],
       [343.25586, 222.0911 ]], dtype=float32)

Take the central pixel of each image as the predicted value:

In [ ]:
im = PILImage.create(valid_img_files[0])
xshape, yshape = im.shape[0], im.shape[1]
xshape, yshape
Out[ ]:
(480, 640)
In [ ]:
pred = np.vstack((np.array([0.5 * xshape]*len(true)), np.array([0.5 * yshape]*len(true)))).T
pred[:5]
Out[ ]:
array([[240., 320.],
       [240., 320.],
       [240., 320.],
       [240., 320.],
       [240., 320.]])
In [ ]:
# normalise for comparable results to ML
mse = np.average(np.average((true/yshape - pred/yshape) ** 2, axis=0))

print("Baseline best MSE: {}".format(mse))
print("Prediction error: {:.4}%".format(np.sqrt(mse)*100))
Baseline best MSE: 0.024298826210708977
Prediction error: 15.59%

Plot Top Losses

Plot the cases where the model performs worst (and best) to try to understand what the model is deficient in, and successful at. Is it cheating? Or learning as we would expect it to?

In [ ]:
interp = Interpretation.from_learner(learn)
In [ ]:
def fplot_top_losses(interp, dls, k=4, largest=True):
    """
    Plot the k validation images with the largest losses (worst cases,
    largest=True) or the smallest losses (best cases, largest=False).
    """
    # retrieve top losses and indices
    losses = interp.top_losses(k, largest)
    
    # get corresponding image files
    imgs = [dls.valid.items[i] for i in losses.indices]
    
    # plot image files
    dls.test_dl([PILImage.create(i) for i in imgs]).show_batch()
    
    return imgs
In [ ]:
worst = fplot_top_losses(interp, dls, k=4, largest=True)
In [ ]:
best = fplot_top_losses(interp, dls, k=4, largest=False)

Class Activation Maps

Extension activity!

Class Activation Mapping (or CAM) is a common technique in computer vision for "explainable AI". It maps the importance of each input pixel with respect to changes in the output activations, and is normally visualised as a heatmap over the image to highlight the parts most important for the prediction.

To access the activations inside the model while it's training, we need to use PyTorch hooks. For the full tutorial and explanation, see:

In [ ]:
class Hook():
    def __init__(self, m):
        self.hook = m.register_forward_hook(self.hook_func)   
    def hook_func(self, m, i, o): self.stored = o.detach().clone()
    def __enter__(self, *args): return self
    def __exit__(self, *args): self.hook.remove()
In [ ]:
def fshow_cam(learn, dls, x):
    """
    plot class activation map
    """
    with Hook(learn.model[0]) as hook:
        with torch.no_grad(): output = learn.model.eval()(x.cuda())
        act = hook.stored[0]

        cam_map = torch.einsum('ck,kij->cij', learn.model[1][-2].weight, act)

        x_dec = TensorImage(dls.valid.decode((x,))[0][0])
        fig,ax = plt.subplots()
        x_dec.show(ctx=ax)
        im = ax.imshow(cam_map[1].detach().cpu(), alpha=0.4, extent=(0,x.shape[3],x.shape[2],0),
                        interpolation='bilinear', cmap='jet')
        fig.colorbar(im)
        plt.show()
In [ ]:
for img in worst:
    im = PILImage.create(img)
    x, = first(dls.test_dl([im]))
    fshow_cam(learn, dls, x)
In [ ]:
for img in best:
    im = PILImage.create(img)
    x, = first(dls.test_dl([im]))
    fshow_cam(learn, dls, x)